
    EXPLAINABLE FEATURE- AND DECISION-LEVEL FUSION

    Information fusion is the process of aggregating knowledge from multiple data sources to produce more consistent, accurate, and useful information than any individual source can provide. In general, there are three primary sources of data/information: humans, algorithms, and sensors. Typically, objective data (e.g., measurements) arise from sensors. Using these data sources, applications such as computer vision and remote sensing have long applied fusion at different levels (signal, feature, decision, etc.). Furthermore, daily advancements in engineering technologies like smart cars, which operate in complex and dynamic environments using multiple sensors, are raising both the demand for and the complexity of fusion. There is a great need to discover new theories to combine and analyze heterogeneous data arising from one or more sources. The work collected in this dissertation addresses the problem of feature- and decision-level fusion. Specifically, this work focuses on fuzzy Choquet integral (ChI)-based data fusion methods. Most mathematical approaches to data fusion have combined inputs under the assumption of independence between them. However, there are often rich interactions (e.g., correlations) between inputs that should be exploited. The ChI is a powerful aggregation tool that is capable of modeling these interactions. Consider the fusion of m sources, where there are 2^m unique subsets (interactions); the ChI is capable of learning the worth of each of these possible source subsets. However, the complexity of fuzzy integral-based methods grows quickly, as the number of trainable parameters for the fusion of m sources scales as 2^m. Hence, a large amount of training data is required to avoid the problem of over-fitting. This work addresses the over-fitting problem of ChI-based data fusion with novel regularization strategies.
These regularization strategies alleviate over-fitting while training with limited data and also enable the user to consciously push the learned model toward a predefined, or perhaps known, structure. Also, the existing methods for training the ChI for decision- and feature-level data fusion involve quadratic programming (QP). The QP-based approach for learning ChI-based data fusion solutions has high space complexity, which has limited the practical application of ChI-based data fusion methods to six or fewer input sources. To address the space-complexity issue, this work introduces an online training algorithm for learning the ChI. The online method is an iterative gradient-descent approach that processes one observation at a time, enabling the application of ChI-based data fusion to higher-dimensional data sets. In many real-world data fusion applications, it is imperative to have an explanation or interpretation. This may include providing information on what was learned, what the worth of individual sources is, why a decision was reached, what evidence or process(es) were used, and what confidence the system has in its decision. However, most existing machine learning solutions for data fusion, e.g., deep learning, are black boxes. In this work, we designed methods and metrics that help answer these questions of interpretation, and we also developed visualization methods that help users better understand the machine learning solution and its behavior on different instances of data.
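The aggregation described above can be sketched concretely. The following is a minimal illustration (not the dissertation's implementation) of the discrete Choquet integral: inputs are sorted in descending order and weighted by successive differences of the fuzzy-measure values of the growing subset of top sources. The function name and the example measure `g` are invented for illustration.

```python
def choquet_integral(x, g):
    """Discrete Choquet integral of inputs x w.r.t. a fuzzy measure g.

    x : list of m input values, one per source.
    g : dict mapping frozensets of source indices to their worth,
        with g[frozenset()] == 0.0 and g[all m sources] == 1.0.
    """
    m = len(x)
    # Visit sources in descending order of their input values.
    order = sorted(range(m), key=lambda i: x[i], reverse=True)
    total, prev, subset = 0.0, 0.0, set()
    for i in order:
        subset.add(i)
        worth = g[frozenset(subset)]        # worth of the i largest sources
        total += x[i] * (worth - prev)      # weight by the measure increment
        prev = worth
    return total

# Hypothetical measure over two sources; the whole set is worth more
# than the sum of the singletons, modeling a positive interaction.
g = {frozenset(): 0.0, frozenset({0}): 0.3,
     frozenset({1}): 0.4, frozenset({0, 1}): 1.0}
result = choquet_integral([0.8, 0.5], g)    # approx. 0.59
```

Note that the full measure has 2^m values, which is exactly why the number of trainable parameters scales as 2^m in the learning problem described above.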

    Online Learning of the Fuzzy Choquet Integral

    The Choquet Integral (ChI) is an aggregation operator defined with respect to a Fuzzy Measure (FM). The FM encodes the worth of all subsets of the sources of information being aggregated. The monotonicity and boundary conditions of the FM have limited its applicability to decision-level fusion. However, in a recent work, we removed the boundary and monotonicity constraints of the FM, which we then called a bounded capacity (BC), to propose a Choquet Integral Regression (CIR) approach that enables capability beyond previously proposed ChI regression methods. In the same work, we also presented a quadratic programming (QP)-based method, batch-CIR, to learn the BC parameters of the CIR from training data. However, the QP used for learning the BC scales exponentially with the dimensionality of the training data, and thus it becomes impractical on data sets with seven or more dimensions. In this paper, we propose an iterative gradient-descent approach, online-CIR, to learn the BC. This method iteratively processes the training data, one data point at a time, and therefore requires significantly less computation and space at any time during training. The application of batch-CIR required dimensionality reduction of high-dimensional data sets to enable computation in a reasonable time. The proposed online-CIR approach has enabled us to extend CIR to data sets with larger dimensionality. In our experimental evaluation on benchmark regression data sets, online-CIR outperformed batch-CIR on high-dimensional data sets while matching batch-CIR's performance on low-dimensional data sets.
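The key property that makes a one-sample-at-a-time update cheap is that, for a fixed sort order of the inputs, the ChI is linear in the capacity values, so each observation touches only m of the 2^m parameters. The sketch below illustrates one stochastic-gradient step of this kind; the function name, learning rate, and update form are assumptions for illustration, not the paper's exact algorithm.

```python
def cir_sgd_step(x, y, u, lr=0.1):
    """One stochastic-gradient step for Choquet-integral regression.

    For a fixed descending sort order, ChI(x) = sum_i (x_(i) - x_(i+1)) * u[A_i],
    where A_i is the set of the i largest sources and x_(m+1) is taken as 0.
    The squared-error gradient w.r.t. each touched capacity value u[A_i]
    is therefore just (prediction error) * (its linear coefficient).

    u : dict mapping frozensets of source indices to capacity values;
        untouched subsets default to 0.0 and need never be stored.
    """
    m = len(x)
    order = sorted(range(m), key=lambda i: x[i], reverse=True)
    coeffs, subset = [], set()
    for pos, i in enumerate(order):
        subset.add(i)
        nxt = x[order[pos + 1]] if pos + 1 < m else 0.0
        coeffs.append((frozenset(subset), x[i] - nxt))
    pred = sum(c * u.get(s, 0.0) for s, c in coeffs)
    err = pred - y
    for s, c in coeffs:                     # gradient of (pred - y)^2 / 2
        u[s] = u.get(s, 0.0) - lr * err * c
    return pred
```

Because only the m subsets induced by the observed sort order are updated, memory grows with the sort orders actually seen in the data rather than with 2^m, which is the space advantage the abstract describes.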

    Visualization and Analysis Tools for Explainable Choquet Integral Regression

    The Choquet integral (ChI) is an aggregation operator defined with respect to a fuzzy measure (FM). The FM of a ChI encodes the worth of individual subsets of sources of information, and is an excellent tool for nonlinear aggregation. The monotonicity and boundary conditions of the FM limit the ChI to applications such as decision-level fusion. In a recent work, we removed the boundary and monotonicity constraints of the FM to propose a ChI-based regression (CIR) approach that enables capability beyond previously proposed ChI regression methods. However, the number of values in an FM scales as 2^d, where d is the number of input sources. Thus, with such a large number of trained parameters, we tend to lose the explainability (or interpretability) of the learned solution, which comes readily with simpler methods like ordinary linear regression. In this paper, we enhance the explainability of CIR by extending our previously proposed ChI visualization techniques to CIR. We also present a set of evaluation indices that quantitatively evaluate the importance of individual sources and the interactions between groups of sources. We train CIR on real-world regression data sets, and the learned models are visualized and analyzed with the proposed methods. The demonstrations of the proposed visualizations and analyses are shown to significantly enhance the explainability of the learned CIR models.
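One standard index for summarizing the importance of an individual source under an FM is the Shapley value: the average marginal contribution of a source over all coalitions of the other sources. The sketch below computes it by brute force; it is offered as an illustrative example of this family of evaluation indices, not necessarily the exact indices proposed in the paper.

```python
from itertools import combinations
from math import factorial

def shapley_values(g, m):
    """Shapley importance of each of m sources under fuzzy measure g.

    g : dict mapping frozensets of {0, ..., m-1} to measure values.
    Returns phi with phi[i] the average marginal contribution of source i;
    the phi values sum to g(full set) - g(empty set).
    """
    phi = [0.0] * m
    for i in range(m):
        others = [j for j in range(m) if j != i]
        for k in range(len(others) + 1):
            # Shapley weight for coalitions of size k.
            w = factorial(m - k - 1) * factorial(k) / factorial(m)
            for K in combinations(others, k):
                K = frozenset(K)
                phi[i] += w * (g[K | {i}] - g[K])
    return phi
```

For a normalized FM the values sum to one, so they read directly as a percentage breakdown of source worth, which is the kind of per-source summary that makes a 2^d-parameter model explainable.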

    Novel Regularization for Learning the Fuzzy Choquet Integral with Limited Training Data

    Fuzzy integrals (FIs) are powerful aggregation operators that fuse information from multiple sources. The aggregation is parameterized by a fuzzy measure (FM), which encodes the worths of all subsets of sources. Since the FI is defined with respect to an FM, much consideration must be given to defining the FM. However, in practice this is a difficult task: the number of values in an FM scales as 2^n, where n is the number of input sources, so manually specifying an FM quickly becomes tedious. In this article, we review an automatic, data-supported method of learning the FM by minimizing a sum-of-squared-error objective function in the context of decision-level fusion of classifiers using the Choquet FI. While this solves the specification problem, we illuminate an issue encountered with many real-world data sets; i.e., if the training data do not contain a significant number of all possible sort orders, many of the FM values are not supported by the data. We propose various regularization strategies to alleviate this issue by pushing the learned FM toward a predefined structure; these regularizers allow the user to encode knowledge of the underlying FM into the learning problem. Furthermore, we propose another regularization strategy that constrains the learned FM's structure to be a linear order statistic. Finally, we perform several experiments using synthetic and real-world data sets and show that our proposed extensions can improve the learned FM behavior and classification accuracy. A previously proposed visualization technique is employed to quantitatively illustrate both the FM and the FI.
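The "push toward a predefined structure" idea can be sketched with a simple quadratic penalty lambda * ||u - u0||^2 added to the sum-of-squared-error objective, where u0 is a user-supplied target measure (e.g., an OWA-like structure). This is a hedged illustration of the general mechanism, not the specific regularizers proposed in the article; the function name, step size, and penalty weight are assumptions.

```python
def regularized_update(u, u0, grad, lr=0.05, lam=0.1):
    """One gradient step on SSE(u) + lam * ||u - u0||^2.

    u    : dict of learned FM values (frozenset -> float)
    u0   : predefined target structure for the FM (hypothetical prior)
    grad : dict of SSE gradients for the FM values touched by a sample
    The penalty pulls every FM value toward u0, so variables left
    unsupported by the observed sort orders still get a sensible default.
    """
    for s in set(grad) | set(u0):
        g_data = grad.get(s, 0.0)                       # data-driven term
        g_reg = 2.0 * lam * (u.get(s, 0.0) - u0.get(s, 0.0))  # pull to prior
        u[s] = u.get(s, 0.0) - lr * (g_data + g_reg)
    return u
```

When a subset never appears in any observed sort order, `grad` contributes nothing for it and the update reduces to a pure pull toward `u0`, which is exactly the unsupported-variable problem the regularizers address.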